Overview

Dataset statistics

Number of variables: 9
Number of observations: 325
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 20.8 KiB
Average record size in memory: 65.4 B

Variable types

NUM: 8
BOOL: 1

Reproduction

Analysis started: 2020-05-04 10:57:35.859422
Analysis finished: 2020-05-04 10:57:51.354862
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

user.listed_count is highly correlated with user.followers_count (High Correlation)
user.followers_count is highly correlated with user.listed_count (High Correlation)
user.favourites_count has 20 (6.2%) zeros (Zeros)

Variables

Unnamed: 0
Real number (ℝ≥0)

UNIFORM
UNIQUE
Distinct count: 325
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 162.0
Minimum: 0
Maximum: 324
Zeros: 1
Zeros (%): 0.3%
Memory size: 2.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 16.2
Q1: 81
Median: 162
Q3: 243
95-th percentile: 307.8
Maximum: 324
Range: 324
Interquartile range (IQR): 162

Descriptive statistics

Standard deviation: 93.96364545
Coefficient of variation (CV): 0.5800225028
Kurtosis: -1.2
Mean: 162
Mean absolute deviation (MAD): 81.24923077
Skewness: 0
Sum: 52650
Variance: 8829.166667
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 324.], "bayesian blocks" binning strategy used)
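
The figures for this column can be checked by hand, because its values appear to be the row numbers 0 through 324. The following is a minimal sketch under that assumption (the report does not list every value); the `percentile` helper is a hypothetical name and uses linear interpolation between neighbouring order statistics. It also shows that the reported MAD of 81.24923077 matches the mean absolute deviation around the mean, not a median-based deviation.

```python
import math
import statistics

# Assumption: the column holds the consecutive integers 0, 1, ..., 324.
values = list(range(325))


def percentile(sorted_values, p):
    """Percentile for p in [0, 1], linearly interpolated between neighbours."""
    pos = p * (len(sorted_values) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, divides by n - 1
std = math.sqrt(variance)
cv = std / mean                         # coefficient of variation
mean_abs_dev = sum(abs(v - mean) for v in values) / len(values)

print("mean:", mean, "sum:", sum(values))
print("variance:", variance, "std:", std, "cv:", cv)
print("mean absolute deviation:", mean_abs_dev)
print("5%, Q1, Q3, 95%:",
      percentile(values, 0.05), percentile(values, 0.25),
      percentile(values, 0.75), percentile(values, 0.95))
```

Under this assumption the printed values agree with the reported statistics up to rounding.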
Value Count Frequency (%)
324 1 0.3%
120 1 0.3%
102 1 0.3%
103 1 0.3%
104 1 0.3%
105 1 0.3%
106 1 0.3%
107 1 0.3%
108 1 0.3%
109 1 0.3%
Other values (315) 315 96.9%

Value Count Frequency (%)
0 1 0.3%
1 1 0.3%
2 1 0.3%
3 1 0.3%
4 1 0.3%

Value Count Frequency (%)
324 1 0.3%
323 1 0.3%
322 1 0.3%
321 1 0.3%
320 1 0.3%

id_str
Real number (ℝ≥0)

UNIQUE
Distinct count: 325
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.44870414085251e+17
Minimum: 498280126254428160
Maximum: 775057555865206784
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.7 KiB

Quantile statistics

Minimum: 4.982801263e+17
5-th percentile: 5.002808898e+17
Q1: 5.24948206e+17
Median: 5.443142345e+17
Q3: 5.53480083e+17
95-th percentile: 5.807802427e+17
Maximum: 7.750575559e+17
Range: 2.767774296e+17
Interquartile range (IQR): 2.853187697e+16

Descriptive statistics

Standard deviation: 4.226597558e+16
Coefficient of variation (CV): 0.07757069294
Kurtosis: 16.79053837
Mean: 5.448704141e+17
Mean absolute deviation (MAD): 2.314139912e+16
Skewness: 3.514571689
Sum: -7.384556159e+18
Variance: 1.786412692e+33
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[4.98280126e+17 5.00264595e+17 5.00403940e+17 5.24923012e+17 5.24949155e+17 ... 5.80319131e+17 5.80356219e+17 5.81429591e+17 7.60628951e+17 7.75057556e+17], "bayesian blocks" binning strategy used)
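
The negative Sum reported for id_str cannot be right for a column of positive identifiers. A plausible explanation, offered here as an assumption since the report does not say how the sum was accumulated, is signed 64-bit integer overflow: 325 values near 5.4e+17 add up to roughly 1.77e+20, which exceeds the int64 maximum of about 9.22e+18. The sketch below reconstructs the total from the reported mean and row count and wraps it the way two's-complement arithmetic would; `wrap_int64` is a hypothetical helper name.

```python
INT64_MAX = 2**63 - 1


def wrap_int64(x: int) -> int:
    """Reduce an arbitrary Python int to the signed 64-bit range (two's complement)."""
    return (x + 2**63) % 2**64 - 2**63


reported_mean = 5.44870414085251e17  # Mean as printed in the report
n_rows = 325                         # Number of observations

# Approximate true total; limited by the precision of the printed mean.
true_sum = int(reported_mean * n_rows)
wrapped = wrap_int64(true_sum)

print("approximate true sum:", true_sum)
print("exceeds int64 max:", true_sum > INT64_MAX)
print("after int64 wrap-around:", wrapped)
```

The wrapped value agrees with the reported Sum of -7.384556159e+18 to the printed precision, which supports the overflow reading. Since id_str is an identifier, its sum, mean, and variance carry no analytical meaning in any case.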
Value Count Frequency (%)
5.803398256e+17 1 0.3%
5.002707808e+17 1 0.3%
5.808823419e+17 1 0.3%
5.445104501e+17 1 0.3%
5.80371846e+17 1 0.3%
5.443055403e+17 1 0.3%
5.531606526e+17 1 0.3%
5.535485674e+17 1 0.3%
5.002846995e+17 1 0.3%
5.535881787e+17 1 0.3%
Other values (315) 315 96.9%

Value Count Frequency (%)
4.982801263e+17 1 0.3%
4.982936687e+17 1 0.3%
4.984307837e+17 1 0.3%
4.984868263e+17 1 0.3%
4.993666663e+17 1 0.3%

Value Count Frequency (%)
7.750575559e+17 1 0.3%
7.749910783e+17 1 0.3%
7.699886368e+17 1 0.3%
7.688597802e+17 1 0.3%
7.677259567e+17 1 0.3%

favorite_count
Real number (ℝ≥0)

Distinct count: 158
Unique (%): 48.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 97.98461538461538
Minimum: 0
Maximum: 2123
Zeros: 2
Zeros (%): 0.6%
Memory size: 2.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 6.2
Q1: 24
Median: 47
Q3: 97
95-th percentile: 326
Maximum: 2123
Range: 2123
Interquartile range (IQR): 73

Descriptive statistics

Standard deviation: 187.52013
Coefficient of variation (CV): 1.913771149
Kurtosis: 58.05346288
Mean: 97.98461538
Mean absolute deviation (MAD): 89.36766864
Skewness: 6.634954761
Sum: 31845
Variance: 35163.79915
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 69.5 145.5 277. 691. 2123. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
20 11 3.4%
33 8 2.5%
28 8 2.5%
38 7 2.2%
17 7 2.2%
10 6 1.8%
45 6 1.8%
32 6 1.8%
41 6 1.8%
5 5 1.5%
Other values (148) 255 78.5%

Value Count Frequency (%)
0 2 0.6%
1 3 0.9%
2 1 0.3%
3 1 0.3%
4 4 1.2%

Value Count Frequency (%)
2123 1 0.3%
1635 1 0.3%
1114 1 0.3%
698 1 0.3%
684 1 0.3%

retweet_count
Real number (ℝ≥0)

Distinct count: 215
Unique (%): 66.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 296.2246153846154
Minimum: 2
Maximum: 4388
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.7 KiB

Quantile statistics

Minimum: 2
5-th percentile: 26.2
Q1: 116
Median: 167
Q3: 288
95-th percentile: 968.2
Maximum: 4388
Range: 4386
Interquartile range (IQR): 172

Descriptive statistics

Standard deviation: 414.8476688
Coefficient of variation (CV): 1.400449683
Kurtosis: 36.11915216
Mean: 296.2246154
Mean absolute deviation (MAD): 230.3489515
Skewness: 4.98018756
Sum: 96273
Variance: 172098.5883
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[2.000e+00 3.250e+01 9.750e+01 1.225e+02 1.865e+02 3.075e+02 5.820e+02 1.128e+03 4.388e+03], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
120 7 2.2%
101 6 1.8%
154 5 1.5%
105 5 1.5%
109 5 1.5%
116 5 1.5%
106 4 1.2%
149 4 1.2%
121 4 1.2%
114 4 1.2%
Other values (205) 276 84.9%

Value Count Frequency (%)
2 2 0.6%
3 2 0.6%
4 2 0.6%
5 1 0.3%
7 2 0.6%

Value Count Frequency (%)
4388 1 0.3%
2825 1 0.3%
2517 1 0.3%
1760 1 0.3%
1755 1 0.3%

user.verified
Boolean

Distinct count: 2
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 453.0 B

Value Count Frequency (%)
True 217 66.8%
False 108 33.2%

user.followers_count
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE
Distinct count: 325
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1682899.886153846
Minimum: 25
Maximum: 22720010
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.7 KiB

Quantile statistics

Minimum: 25
5-th percentile: 1492.6
Q1: 18359
Median: 133561
Q3: 1208620
95-th percentile: 8384436.8
Maximum: 22720010
Range: 22719985
Interquartile range (IQR): 1190261

Descriptive statistics

Standard deviation: 3775090.367
Coefficient of variation (CV): 2.243205551
Kurtosis: 12.84944401
Mean: 1682899.886
Mean absolute deviation (MAD): 2271979.384
Skewness: 3.432608965
Sum: 546942463
Variance: 1.425130728e+13
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[2.5000000e+01 8.0345000e+03 4.5580500e+04 2.0201250e+05 3.6586050e+05 ... 2.0351740e+06 4.6361720e+06 4.6368350e+06 5.7086095e+06 2.2720010e+07], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
306687 1 0.3%
61134 1 0.3%
101550 1 0.3%
66735 1 0.3%
842416 1 0.3%
101553 1 0.3%
34994 1 0.3%
39603 1 0.3%
5303 1 0.3%
79481 1 0.3%
Other values (315) 315 96.9%

Value Count Frequency (%)
25 1 0.3%
220 1 0.3%
253 1 0.3%
286 1 0.3%
345 1 0.3%

Value Count Frequency (%)
22720010 1 0.3%
22667902 1 0.3%
20041552 1 0.3%
20040953 1 0.3%
20038478 1 0.3%

user.listed_count
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 296
Unique (%): 91.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17914.523076923077
Minimum: 0
Maximum: 160927
Zeros: 2
Zeros (%): 0.6%
Memory size: 2.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 40
Q1: 306
Median: 2204
Q3: 13598
95-th percentile: 101784.4
Maximum: 160927
Range: 160927
Interquartile range (IQR): 13292

Descriptive statistics

Standard deviation: 34174.52194
Coefficient of variation (CV): 1.907643413
Kurtosis: 5.53278448
Mean: 17914.52308
Mean absolute deviation (MAD): 23033.64251
Skewness: 2.453615997
Sum: 5822220
Variance: 1167897950
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 1.165000e+02 3.595000e+02 1.653000e+03 2.899000e+03 ... 4.443600e+04 4.443900e+04 1.215780e+05 1.587405e+05 1.609270e+05], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
40 4 1.2%
5313 3 0.9%
812 3 0.9%
0 2 0.6%
170 2 0.6%
84 2 0.6%
13365 2 0.6%
24938 2 0.6%
1075 2 0.6%
5273 2 0.6%
Other values (286) 301 92.6%

Value Count Frequency (%)
0 2 0.6%
1 2 0.6%
6 1 0.3%
9 1 0.3%
14 1 0.3%

Value Count Frequency (%)
160927 1 0.3%
160810 1 0.3%
158785 1 0.3%
158767 1 0.3%
158766 1 0.3%

user.friends_count
Real number (ℝ≥0)

Distinct count: 237
Unique (%): 72.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4176.15076923077
Minimum: 3
Maximum: 109492
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.7 KiB

Quantile statistics

Minimum: 3
5-th percentile: 17
Q1: 308
Median: 525
Q3: 1733
95-th percentile: 18781.6
Maximum: 109492
Range: 109489
Interquartile range (IQR): 1425

Descriptive statistics

Standard deviation: 13682.8415
Coefficient of variation (CV): 3.276424212
Kurtosis: 29.06326193
Mean: 4176.150769
Mean absolute deviation (MAD): 5854.606258
Skewness: 5.247521827
Sum: 1357249
Variance: 187220151.4
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[3.00000e+00 4.50000e+00 2.62500e+02 2.66500e+02 3.78500e+02 ... 1.04950e+03 2.88950e+03 2.89250e+03 1.16580e+04 1.09492e+05], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
3 11 3.4%
460 8 2.5%
264 8 2.5%
382 5 1.5%
113 5 1.5%
136 4 1.2%
308 4 1.2%
389 4 1.2%
17 4 1.2%
7352 4 1.2%
Other values (227) 268 82.5%

Value Count Frequency (%)
3 11 3.4%
6 3 0.9%
15 1 0.3%
17 4 1.2%
24 1 0.3%

Value Count Frequency (%)
109492 1 0.3%
86003 1 0.3%
85683 2 0.6%
84767 1 0.3%
72382 1 0.3%

user.favourites_count
Real number (ℝ≥0)

ZEROS
Distinct count: 195
Unique (%): 60.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3642.6153846153848
Minimum: 0
Maximum: 152373
Zeros: 20
Zeros (%): 6.2%
Memory size: 2.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 15
Median: 208
Q3: 1066
95-th percentile: 19541.2
Maximum: 152373
Range: 152373
Interquartile range (IQR): 1051

Descriptive statistics

Standard deviation: 14332.52618
Coefficient of variation (CV): 3.934680076
Kurtosis: 67.73842638
Mean: 3642.615385
Mean absolute deviation (MAD): 5500.018935
Skewness: 7.656073368
Sum: 1183850
Variance: 205421306.7
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 5.00000e-01 2.50000e+00 2.75000e+01 9.15000e+01 7.25000e+02 2.42700e+03 5.10500e+03 2.63330e+04 1.52373e+05], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 20 6.2%
2 12 3.7%
17 9 2.8%
26 8 2.5%
8 8 2.5%
9 7 2.2%
13 7 2.2%
5 6 1.8%
1 6 1.8%
208 5 1.5%
Other values (185) 237 72.9%

Value Count Frequency (%)
0 20 6.2%
1 6 1.8%
2 12 3.7%
3 3 0.9%
5 6 1.8%

Value Count Frequency (%)
152373 1 0.3%
138575 1 0.3%
106348 1 0.3%
50154 1 0.3%
49458 1 0.3%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, where -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
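
As an illustration of the calculation just described, here is a minimal sketch in plain Python. The function name `pearson_r` and the sample data are hypothetical and do not come from the profiled dataset. The shared 1/(n-1) factors of the covariance and the standard deviations cancel, so the code works with raw sums of deviations.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of (cross-)deviations; the common normalising factor cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_shifted = pearson_r([2 * x + 3 for x in xs], ys)  # change location and scale of X
r_flipped = pearson_r(xs, [-y for y in ys])         # negate Y

print(r, r_shifted, r_flipped)
```

Shifting and rescaling X by a positive factor leaves r unchanged, while negating one variable flips its sign.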

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1, where -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
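
The rank-based calculation can be sketched as follows, again in plain Python with hypothetical names (`average_ranks`, `spearman_rho`) and made-up data. Tied values are assumed to receive the average of the ranks they span, which is a common convention but is not stated in the report.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y grows monotonically but not linearly with x.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For this monotonic but nonlinear relationship ρ reaches 1 while Pearson's r stays below 1, which is the distinction the paragraph above draws.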

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, where -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
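
The pair-counting definition above can be sketched directly. This is the simple form that divides by the total number of pairs (often called tau-a); implementations in statistical libraries may use a tie-adjusted variant instead, so results can differ when the data contain ties. The function name `kendall_tau` and the data are hypothetical.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs, as described in the text."""
    pairs = list(combinations(zip(xs, ys), 2))
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the two observations the same way
        elif sign < 0:
            discordant += 1   # the variables order the two observations oppositely
        # sign == 0 means a tie in x or y; it counts as neither
    return (concordant - discordant) / len(pairs)


# Hypothetical data with two swapped neighbours.
xs = [1, 2, 3, 4, 5]
ys = [1, 3, 2, 5, 4]

tau = kendall_tau(xs, ys)
print(tau)
```

With five observations there are ten pairs; here eight are concordant and two are discordant, giving τ = (8 - 2) / 10.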

Missing values

Sample

First rows

Unnamed: 0  id_str  favorite_count  retweet_count  user.verified  user.followers_count  user.listed_count  user.friends_count  user.favourites_count
005296602960809164804711False3135224521287623
11529653029747064832712False5018442691
2252968741061172838487True4468610304122
335297134671846768643731True20646681399636253103
4452968967941181030453False345130811354
5552971645379295641612False5959712492249
66529695367680761856912True3205414221513562
7752969548366166425748False25016763
88529739968470867968826True665484978219782190
9952954073302040576073False3664404745

Last rows

Unnamed: 0  id_str  favorite_count  retweet_count  user.verified  user.followers_count  user.listed_count  user.friends_count  user.favourites_count
31531550027804559736832050195True722151123594131
3163165763230868883619841342True46401174827090
31731757679643273007104025False50114251034204
3183185768292629274132483668False209684961960522102
31931957627694764840550544False1485639721954
3203205763198328005550082669False1361791514388587
32132157675517453186252904False144715941117538
3223225765134637381099542053True93743083182351488
32332357681299841893990402False13722110976608
32432457725831794214912047False11502126258048986